|
Composite classification model learned on multiple isolated subdomains for imbalanced class
JIN Yan, PENG Xinguang
Journal of Computer Applications
2016, 36 (9):
2475-2480.
DOI: 10.11772/j.issn.1001-9081.2016.09.2475
Started with the regional distribution characteristics, a composite classification model learned on multiple isolated subdomains was proposed to further study the class imbalance problem. In the subdomains division stage, each class was described as ultra-small spheres by improved Support Vector Data Description (SVDD) algorithm, then class domain was divided into intensive and sparse domains. Some instances were founded out from the boundaries of classes and composed of class overlapping domains. In the subdomains cleanup stage, according to sample availability parameters related to domain tightness, noise data was cleaned up by improved
K-Nearest Neighbor (
KNN). After combining classifiers sequentially which were learned on isolated subdomains, the Composite Classification model (CCRD) was generated. In the comparison with similar algorithms including SVM (Support Vector Machine),
KNN, C4.5 and MetaCost, CCRD can obviously improve the accuracy of positive instances without increasing mistake of negative instances; in the comparison with SMOTE (Synthetic Minority Over-sampling TEchnique) sampling, CCRD can improve the misjudgement of negative instances without affecting the classification of the positive instances; in the experiments on five datasets, the classification performance of CCRD is also improved, especially in Haberman_sur. Experimental results indicate that the composite classification model learned on multiple isolated subdomains has excellent classification capability, and it is an effective method for inbalanced dataset.
Reference |
Related Articles |
Metrics
|
|